dsandersllvm
Collaborator

@dsandersllvm dsandersllvm commented Sep 5, 2025

Some targets like PowerPC store their vector elements in an endian-dependent order while others use the same order regardless of endianness:

  • PowerPC little endian: little endian elements ordered 0, 1, 2, ...
  • PowerPC big endian: big endian elements ordered n-1, n-2, n-3, ...
  • ARM/MIPS little endian: little endian elements ordered 0, 1, 2, ...
  • ARM/MIPS big endian: big endian elements ordered 0, 1, 2, ...

This matters when LLVM-IR values are transferred to/from target memory since LLVM-IR orders elements 0, 1, 2, ... regardless of endianness.

This will be used in #155000 by changes to the IRInterpreter to allow it to evaluate some vectors without executing on the target.
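As a rough illustration of the four cases listed in the description (not code from this patch), the sketch below computes the byte offset in target memory at which LLVM-IR element `index` starts. The names `ElementOrder` and `ElementOffset` are hypothetical and exist only for this example; `ElementOrder` stands in for `lldb::ByteOrder`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for lldb::ByteOrder so the sketch compiles on its own.
enum class ElementOrder { Little, Big };

// Returns the byte offset, from the start of the vector in target memory, of
// the element that LLVM-IR calls `index`.
//  - Little element order: element 0 sits at the lowest address.
//  - Big element order:    element 0 sits at the highest address.
// The byte order *inside* each element is a separate question and is not
// handled here.
std::size_t ElementOffset(ElementOrder order, std::size_t index,
                          std::size_t num_elements, std::size_t element_size) {
  assert(index < num_elements && "element index out of range");
  const std::size_t slot =
      (order == ElementOrder::Little) ? index : num_elements - 1 - index;
  return slot * element_size;
}
```

For a vector of four 4-byte elements, element 0 is at offset 0 under little element order and at offset 12 under big element order.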

@llvmbot
Member

llvmbot commented Sep 5, 2025

@llvm/pr-subscribers-lldb

Author: Daniel Sanders (dsandersllvm)

Changes

Some targets like PowerPC store their

  • PowerPC little endian: little endian elements ordered 0, 1, 2, ...
  • PowerPC big endian: big endian elements ordered n-1, n-2, n-3, ...
  • ARM/MIPS little endian: little endian elements ordered 0, 1, 2, ...
  • ARM/MIPS big endian: big endian elements ordered 0, 1, 2, ...

This matters when LLVM-IR values are transferred to/from target memory since LLVM-IR orders elements 0, 1, 2, ... regardless of endianness.

This will be used in #155000 by changes to the IRInterpreter to allow it to evaluate some vectors without executing on the target.


Full diff: https://github.com/llvm/llvm-project/pull/157198.diff

3 Files Affected:

  • (modified) lldb/include/lldb/Core/Architecture.h (+11)
  • (modified) lldb/source/Plugins/Architecture/PPC64/ArchitecturePPC64.cpp (+6-1)
  • (modified) lldb/source/Plugins/Architecture/PPC64/ArchitecturePPC64.h (+6-1)
diff --git a/lldb/include/lldb/Core/Architecture.h b/lldb/include/lldb/Core/Architecture.h
index b6fc1a20e1e69..f039d05fe00fa 100644
--- a/lldb/include/lldb/Core/Architecture.h
+++ b/lldb/include/lldb/Core/Architecture.h
@@ -129,6 +129,17 @@ class Architecture : public PluginInterface {
                                        RegisterContext &reg_context) const {
     return false;
   }
+
+  /// Get the vector element order for this architecture. This determines how
+  /// vector elements are indexed. This matters in a few places such as reading/
+  /// writing LLVM-IR values to/from target memory. Some architectures use
+  /// little-endian element ordering where element 0 is at the lowest address
+  /// even when the architecture is otherwise big-endian (e.g. MIPS MSA, ARM
+  /// NEON), but some architectures like PowerPC may use big-endian element
+  /// ordering where element 0 is at the highest address.
+  virtual lldb::ByteOrder GetVectorElementOrder() const {
+    return lldb::eByteOrderLittle;
+  }
 };
 
 } // namespace lldb_private
diff --git a/lldb/source/Plugins/Architecture/PPC64/ArchitecturePPC64.cpp b/lldb/source/Plugins/Architecture/PPC64/ArchitecturePPC64.cpp
index b8fac55e41da7..a4690cc561a28 100644
--- a/lldb/source/Plugins/Architecture/PPC64/ArchitecturePPC64.cpp
+++ b/lldb/source/Plugins/Architecture/PPC64/ArchitecturePPC64.cpp
@@ -35,7 +35,8 @@ void ArchitecturePPC64::Terminate() {
 std::unique_ptr<Architecture> ArchitecturePPC64::Create(const ArchSpec &arch) {
   if (arch.GetTriple().isPPC64() &&
       arch.GetTriple().getObjectFormat() == llvm::Triple::ObjectFormatType::ELF)
-    return std::unique_ptr<Architecture>(new ArchitecturePPC64());
+    return std::unique_ptr<Architecture>(
+        new ArchitecturePPC64(arch.GetByteOrder()));
   return nullptr;
 }
 
@@ -60,3 +61,7 @@ void ArchitecturePPC64::AdjustBreakpointAddress(const Symbol &func,
 
   addr.SetOffset(addr.GetOffset() + loffs);
 }
+
+lldb::ByteOrder ArchitecturePPC64::GetVectorElementOrder() const {
+  return m_vector_element_order;
+}
diff --git a/lldb/source/Plugins/Architecture/PPC64/ArchitecturePPC64.h b/lldb/source/Plugins/Architecture/PPC64/ArchitecturePPC64.h
index 80f7f27b54cce..9a0edf371d539 100644
--- a/lldb/source/Plugins/Architecture/PPC64/ArchitecturePPC64.h
+++ b/lldb/source/Plugins/Architecture/PPC64/ArchitecturePPC64.h
@@ -30,9 +30,14 @@ class ArchitecturePPC64 : public Architecture {
   void AdjustBreakpointAddress(const Symbol &func,
                                Address &addr) const override;
 
+  lldb::ByteOrder GetVectorElementOrder() const override;
+
 private:
   static std::unique_ptr<Architecture> Create(const ArchSpec &arch);
-  ArchitecturePPC64() = default;
+  ArchitecturePPC64(lldb::ByteOrder vector_element_order)
+      : m_vector_element_order(vector_element_order) {}
+
+  lldb::ByteOrder m_vector_element_order;
 };
 
 } // namespace lldb_private
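The following is a minimal, self-contained sketch of how a caller might consume the new hook. It is an assumption about usage, not code from this PR or from #155000. The `ByteOrder`, `Architecture`, and `ArchitecturePPC64` types below are simplified mocks of the real LLDB classes (the real PPC64 constructor is private and reached through `Create`), and `ToIROrder` and `CheckElementOrderHook` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Mock of lldb::ByteOrder, reduced to the two values used here.
enum class ByteOrder { Little, Big };

// Mock of the base class: the default mirrors the patch (little element order).
struct Architecture {
  virtual ~Architecture() = default;
  virtual ByteOrder GetVectorElementOrder() const { return ByteOrder::Little; }
};

// Mock of the PPC64 plugin: the element order follows the byte order it was
// created with, as in the patch.
struct ArchitecturePPC64 : Architecture {
  explicit ArchitecturePPC64(ByteOrder order) : m_vector_element_order(order) {}
  ByteOrder GetVectorElementOrder() const override {
    return m_vector_element_order;
  }
  ByteOrder m_vector_element_order;
};

// Hypothetical helper: takes elements in ascending memory-address order and
// returns them in LLVM-IR order (element 0 first).
std::vector<std::uint32_t> ToIROrder(const Architecture &arch,
                                     std::vector<std::uint32_t> in_memory) {
  if (arch.GetVectorElementOrder() == ByteOrder::Big)
    std::reverse(in_memory.begin(), in_memory.end());
  return in_memory;
}

// Small check of both paths; the values are element indices as laid out in
// memory on a target with big element order.
void CheckElementOrderHook() {
  Architecture generic;
  ArchitecturePPC64 ppc_be(ByteOrder::Big);
  const std::vector<std::uint32_t> memory = {3, 2, 1, 0};
  assert(ToIROrder(generic, memory) == memory);
  assert((ToIROrder(ppc_be, memory) == std::vector<std::uint32_t>{0, 1, 2, 3}));
}
```

Because the base class defaults to little element order, only plugins whose element order differs need to override the method.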

@Michael137 Michael137 changed the title [lldb] Architecture plugins should report the vector element order. NFC [lldb][NFC] Architecture plugins should report the vector element order Sep 7, 2025
@Michael137
Member

Some targets like PowerPC store their

Is there some text missing here?

@DavidSpickett
Collaborator

DavidSpickett commented Sep 8, 2025

I can confirm that AArch64 Neon and SVE work as stated:

B2.9.3.2 Endianness in SIMD operations

...The four elements appear in the register in array order, with the lowest indexed element fetched from the lowest address. The order of bytes in the elements depends on the endianness configuration, as shown in Figure B2-3. Therefore, the order of the elements in the registers is the same regardless of the endianness configuration.

B2.9.3.3 Endianness in SVE operations

Rules on byte and element order of SIMD load and store instructions apply to SVE load and store instructions.

Do we know what vector extension s390x/SystemZ has and what order it uses?

@dsandersllvm
Collaborator Author

Some targets like PowerPC store their

Is there some text missing here?

Oops, fixed it.

Do we know what vector extension s390x/SystemZ has and what order it uses?

I don't know, but I've found bytecodealliance/wasmtime#4566, so @uweigand might be able to confirm. The first section seems to be describing the ARM/MIPS behaviour where the element order in the ISA remains the same regardless of the endianness of the elements themselves. I don't see any of the required shuffling bitcasts in lib/Target/SystemZ for that behaviour though. That thread goes on to mention implementing all four combinations, so it might be configurable.

@uweigand
Member

As I described in the wasmtime issue linked above, there is both an ISA and an ABI issue here. From an ISA perspective, SystemZ has vector registers, and instructions that operate on numbered lanes of the vector register if interpreted in a particular type (like v4i32). These lane numbers are in some cases provided in a register operand (e.g. permute), in some cases provided as an immediate operand (e.g. insert/extract element), and in some cases implied (e.g. "even" vs. "odd" lanes).

Now, the default vector load/store instructions on SystemZ operate in a fashion where those lane numbers correspond to array indices in memory. That is, if you load an array x of four i32 values using a VECTOR LOAD, then lane number i holds the element x[i]. In this respect, SystemZ (when using these default load/store instructions, which the SystemZ LLVM backend always does) behaves just like Intel or ARM.

However, given the big-endian nature of the platform, the behavior is still different from Intel or ARM if you then re-interpret that same vector register in a different type. In the example above, if you were to re-interpret this register loaded from an array of four i32 values as a v8i16 vector type, then element 0 holds the high part of x[0] and element 1 holds the low part of x[0], while on Intel or ARM this would be reversed, simply because the value x[0] is little-endian on those platforms while it is big-endian on SystemZ.
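The re-interpretation effect described above can be sketched in portable C++ by modelling the vector as 16 bytes. This is an illustrative model only, not SystemZ or LLVM code. `Endian`, `StoreV4I32`, `ReinterpretAsV8I16`, and `CheckReinterpret` are hypothetical names, and elements are always kept in array order, matching the default load/store behaviour.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Endian { Little, Big };

// Lay out four i32 values in array order (x[0] at the lowest address), with
// the bytes inside each element ordered according to `e`.
std::array<std::uint8_t, 16> StoreV4I32(const std::array<std::uint32_t, 4> &x,
                                        Endian e) {
  std::array<std::uint8_t, 16> bytes{};
  for (std::size_t i = 0; i < 4; ++i) {
    for (std::size_t b = 0; b < 4; ++b) {
      const std::size_t shift = (e == Endian::Little) ? 8 * b : 8 * (3 - b);
      bytes[4 * i + b] = static_cast<std::uint8_t>(x[i] >> shift);
    }
  }
  return bytes;
}

// View the same 16 bytes as eight i16 lanes, again in array order.
std::array<std::uint16_t, 8>
ReinterpretAsV8I16(const std::array<std::uint8_t, 16> &bytes, Endian e) {
  std::array<std::uint16_t, 8> lanes{};
  for (std::size_t i = 0; i < 8; ++i) {
    const unsigned first = bytes[2 * i];
    const unsigned second = bytes[2 * i + 1];
    lanes[i] = static_cast<std::uint16_t>(
        (e == Endian::Little) ? (first | (second << 8))
                              : ((first << 8) | second));
  }
  return lanes;
}

// With x[0] = 0x11223344, a big-endian target sees the high half in lane 0,
// while a little-endian target sees the low half there.
void CheckReinterpret() {
  const std::array<std::uint32_t, 4> x = {0x11223344u, 0u, 0u, 0u};
  const auto be = ReinterpretAsV8I16(StoreV4I32(x, Endian::Big), Endian::Big);
  const auto le =
      ReinterpretAsV8I16(StoreV4I32(x, Endian::Little), Endian::Little);
  assert(be[0] == 0x1122 && be[1] == 0x3344);
  assert(le[0] == 0x3344 && le[1] == 0x1122);
}
```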

For wasmtime, this was a problem as this violates an assumption of WebAssembly semantics, where the little-endian behavior is required. In order to support this on SystemZ, we use the trick of loading vector values from memory in reverse element order, so that in the above example, you would find x[0] in vector element 3 rather than 0. This makes the machine then "look" little-endian again when re-interpreting the vector register in another type. To implement that, the wasmtime compiler will not use the standard VECTOR LOAD instruction, but rather a VECTOR LOAD ELEMENTS REVERSED instruction. Similarly, for all other operations that use lane numbers, the compiler will emit code that inverts the number - either at compile time (for immediates) or, where necessary, at run time (e.g. for permutes).
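The lane-number inversion described above amounts to mapping lane i to lane n-1-i. The sketch below models that idea with plain arrays. It is a hedged illustration, not wasmtime or SystemZ code: `LoadElementsReversed` only imitates the element placement attributed to VECTOR LOAD ELEMENTS REVERSED, and `InvertLane` and `ExtractElement` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of a reversed-element load: memory element i lands in lane n-1-i.
template <typename T, std::size_t N>
std::array<T, N> LoadElementsReversed(const std::array<T, N> &memory) {
  std::array<T, N> reg{};
  for (std::size_t i = 0; i < N; ++i)
    reg[N - 1 - i] = memory[i];
  return reg;
}

// Translate a source-level lane number into the hardware lane that holds it
// under the reversed layout. For immediates this can be done at compile time.
constexpr std::size_t InvertLane(std::size_t lane, std::size_t num_lanes) {
  return num_lanes - 1 - lane;
}

// Extract the element that the source program calls `source_lane`.
template <typename T, std::size_t N>
T ExtractElement(const std::array<T, N> &reg, std::size_t source_lane) {
  assert(source_lane < N && "lane out of range");
  return reg[InvertLane(source_lane, N)];
}
```

With four lanes, x[0] ends up in hardware lane 3, and extracting source lane 0 still yields x[0] because the lane number is inverted before use.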

In effect, this creates a variant ABI where vectors are always held in reverse order in vector registers. However, this is not used by LLVM at all - only WebAssembly requires this variant ABI.

So in a nutshell, as far as LLVM is concerned, SystemZ behaves the same as Intel or ARM w.r.t. lane numbers.

@dsandersllvm
Collaborator Author

Thanks. I think the overall conclusion w.r.t. this PR is that the default element-order value is OK for SystemZ for both WebAssembly and C++.

For C++:
In the case where the user is typing print my_variable and the variable is a <4 x i32>, we'll generate IR with a load <4 x i32>, ptr @my_variable in it. The path that evaluates by injecting code into the process will generate a load that reads elements in the same order as arrays, and the path that evaluates the IR and reads memory directly will do the same. If the user instead does print my_variable[0], then the code injection path will extract lane 0 from the vector, which was the first element of the array in memory, and the IR evaluation path will just read the first element of the array directly from memory.

In the case where the user types something like print (v8i16)my_variable, both paths will get a vector where the first element is the high half of lane 0, the second is the low half of lane 0, and so on, which is also fine since both evaluation paths give the same outcome.

For WebAssembly we get the right outcome for different reasons. I'd expect the language plugin to correctly use the VECTOR LOAD ELEMENTS REVERSED instruction when generating the IR for SystemZ. Then when that's lowered to LLVM-IR, the IR would correctly account for the element endianness such that print (v8i16)my_variable would print the low half of lane 0, the high half of lane 0, and so on.

Collaborator

@DavidSpickett DavidSpickett left a comment

If this is required for the parent PR and that PR is eventually approved, this LGTM.

Please add System Z to the listed systems in the PR description. That detail may come in handy one day.
